Multilingual Hybrid Text Processing in Ancient Uighur (Chaghatai) Digitalized System

Author

  • Dilmurat Tursun
Abstract

This research discusses the system code page as a special technique for multilingual processing of ancient Uighur literature (hereafter abbreviated as Chaghatai). Based on a detailed analysis of the Arabic, Farsi, and Uighur code pages in the Unicode standard, we propose a code page and keyboard layout that is compatible with Chaghatai, Arabic, Farsi, Uighur, and Latin characters. It is a key technique for building specialized Chaghatai word processing systems.
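
As a rough illustration of why a single shared code page is plausible, the sketch below checks where a few sample letters fall in Unicode. The sample letters, the `block_of` helper, and the two-way range split are assumptions chosen for illustration; they are not the code page or keyboard layout proposed in the paper.

```python
import unicodedata

# Hypothetical sample letters, one per language, chosen for illustration only.
SAMPLES = {
    "Arabic": "\u0628",  # ARABIC LETTER BEH
    "Farsi": "\u067E",   # ARABIC LETTER PEH
    "Uighur": "\u06D5",  # ARABIC LETTER AE
    "Latin": "a",        # LATIN SMALL LETTER A
}


def block_of(ch):
    """Return a coarse Unicode block label for a single character."""
    cp = ord(ch)
    if 0x0600 <= cp <= 0x06FF:
        return "Arabic"
    if cp <= 0x007F:
        return "Basic Latin"
    return "Other"


def describe(ch):
    """Return (code point, official Unicode name, coarse block label)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), block_of(ch))


if __name__ == "__main__":
    for language, ch in SAMPLES.items():
        print(language, *describe(ch))
```

In this sketch the Arabic, Farsi, and Uighur samples all land in the same Unicode Arabic block, which is the kind of overlap a combined code page and keyboard layout would rely on.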

Similar resources

A Three-Step Model of Language Detection in Multilingual Ancient Texts

Ancient corpora contain various multilingual patterns. This imposes numerous problems on their manual annotation and automatic processing. We introduce a lexicon building system, called Lexicon Expander, that has an integrated language detection module, Language Detection (LD) Toolkit. The Lexicon Expander post-processes the output of the LD Toolkit which leads to the improvement of f-score and...

Rule-based Person Name Recognition for Xinjiang Minority Languages

Named entity recognition for the multiple nationalities of Xinjiang is an important part of multi-language processing. In this paper, we analyze the patterns of Uighur and Kazak person names and perform name recognition using a rule-based approach. We also propose and implement rules for Uighur and Kazak word segmentation.

Unsupervised multilingual learning

For centuries, scholars have explored the deep links among human languages. In this thesis, we present a class of probabilistic models that exploit these links as a form of naturally occurring supervision. These models allow us to substantially improve performance for core text processing tasks, such as morphological segmentation, part-of-speech tagging, and syntactic parsing. Besides these tra...

Accurate Collocation Extraction Using a Multilingual Parser

This paper focuses on the use of advanced techniques of text analysis as support for collocation extraction. A hybrid system is presented that combines statistical methods and multilingual parsing for detecting accurate collocational information from English, French, Spanish and Italian corpora. The advantage of relying on full parsing over using a traditional window method (which ignores the s...

A Multimodal Framework for the Recognition of Ancient Tamil Handwritten Characters in Palm Manuscript Using Boolean Bitmap Pattern of Image Zoning

Tamil is one of the oldest languages in the world with rich literature. In the ancient days, the writers, especially in Tamilnadu, used palm leaves to encrypt their writing. A very good example of the usage of Palm leaf manuscripts to store the history is Tamil grammar book named Tolkappiyam which was written during 4th B.C. The ancient literature includes many palm leaf manuscripts that contai...

Journal:
  • Journal of Chinese Language and Computing

Volume 15  Issue 

Pages  -

Publication date 2005